Quantifying biodiversity starts with a sample (or multiple samples) of a biological assemblage. The size (e.g., area or volume) of the sample is often referred to as the sample grain. For example, we might use a 1m2 quadrat to sample the benthic community on a coral reef.


Or use a line intercept transect of a set length (e.g., 50m).

Then, choose a metric…but, there are lots to choose from!

The proliferation of metrics is likely due to the multicomponent nature of biodiversity. Different metrics describe (incorporate) three different components - abundance, richness, and evenness - to varying degrees (McGill 2011b). For diversity considered at only a single scale only, changes in these three components combine to determine variation in biodiversity. We want to choose metrics that allow us to understand and accurately describe how numbers of individuals, as well as the richness and relative abundance of species covary.

Biodiversity through the lens of Individual-Based Rarefaction (IBR) curves

library(tidyverse)
library(mobsim)
library(mobr)
library(cowplot)

# set a RNG seed 
set.seed(42)

# simulate a sample with 20 species and 200 individuals, and 
# a lognormal SAD
sim_poisson_comm <- sim_poisson_community(s_pool = 20,
                                          n_sim = 200,
                                          sad_type = 'lnorm',
                                          sad_coef = list('meanlog' = log(200/20),
                                                          'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm1 <- sim_poisson_comm$census %>% 
  as_tibble()

# need counts of each species for Species Abundance Distribution (and plotting 
# rarefaction curve)
comm1_long <- comm1 %>% 
  group_by(species) %>% 
  summarise(N = n())

# calculate IBR, and put it in a tibble (dataframe) for plotting  
comm1_ibr <- tibble(expected_richness = rarefaction(comm1_long$N, method = 'IBR')) %>% 
  mutate(individuals = 1:n())

# calculate the Probability of Interspecific Encounter (PIE)
comm1_PIE <- tibble(PIE = calc_div(comm1_long$N, index = 'PIE'))

# plot the sample and the IBR
comm1_map <- ggplot() +
  geom_point(data = comm1,
             aes(x = x, y = y, colour = species)) +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(legend.position = 'none',
        axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_rect(fill = NA, colour = '#ff0000'))

comm1_ibr_plot <- ggplot() +
  geom_line(data = comm1_ibr,
            aes(x = individuals, y = expected_richness),
            linewidth = 1.1) +
  geom_point(aes(x = 1, y = 1),
             size = 2) +
  geom_text(aes(x = 35, y = 1, label = '{J = 1, S = 1}'),) +
  geom_point(data = comm1_ibr,
             aes(x = max(individuals), y = max(expected_richness)),
             size = 2) +
  geom_text(data = comm1_ibr %>% 
              # filter so as we only have the last row
              filter(individuals==max(individuals)),
            aes(x = individuals - 35, y = expected_richness - 3,
                label = paste0('{J = ', max(individuals), ', S = ',
                              max(expected_richness), '}'))) +
  # add a line to show PIE; I've got it going through the origin here,
  # but it is the slope of the IBR as we move from one individual to two
  geom_abline(data = comm1_PIE,
              aes(intercept = 0, slope = PIE),
              linewidth = 0.75, alpha = 0.5, linetype = 2) +
  geom_text(data = comm1_PIE,
            aes(x = 35, y = 17, label = paste0('PIE = ', round(PIE,2)))) +
  labs(x = 'Total number of individuals (J)',
       y = 'Expected species richness (S)') +
  theme_bw() 


plot_grid(comm1_map,
          comm1_ibr_plot, 
          nrow = 1)

We can visualise the key components of individuals, richness and evenness using the individual-based rarefaction curve. Numbers of individuals are on the x-axis, and the number of species on the y-axis. Evenness, or the relative abundance of species, is described by the shape of the curve. In particular, the slope at the base of the rarefaction curve, as it moves from one individual to two, is equal to the Probability of Interspecific Encounter (PIE;(Hurlbert 1971; Olszewski 2004). In practice, PIE is often transformed to an Effective Number of Species (ENS) to aid interpretation (L. Jost 2006). The ENS transformation of PIE, which we refer to as \(S_{PIE}\), is equal to the inverse of Simpson’s index (\(S_{PIE} = 1/\sum_{i=1}^{S} p_i^2\)), where \(p_i\) is the relative abundance of species i. \(S_{PIE}\) is also equal to Hill number of diversity index with order equal to two (Hill 1973; Lou Jost 2007).

Exercises

Simulate some data to investigate covariation in abundance, evenness and richness.

#--------set parameters for community abundance, richness and evenness-----
community_params <- expand_grid(
  # number of species
  Spool = c(10, 40, 80), 
  # total number of individuals
  J = c(50, 500),
  # evenness of species relative (log) abundance
  sdlog = c(0.8,2)) %>% 
  mutate(
    # species mean (log) abundance
    meanlog = log(J/Spool)
  ) %>% 
  # create identifier
  mutate(region = 1:n()) 


#--------simulate communities-------------------
communities <- community_params %>% 
  group_by(region) %>%
  nest(data = c(Spool, J, sdlog, meanlog)) %>%
  # create some replicates (multiple samples with the same parameters)
  uncount(20, .remove = FALSE) %>% 
  ungroup() %>% 
  mutate(sim = 1:n()) %>% 
  # generate sample with individuals randomly distributed in space
  # i.e., spatial sampling is poisson process
  mutate(poisson_comm = map(data, 
                            ~sim_poisson_community(
                              s_pool = .x$Spool,
                              n_sim = .x$J,
                              sad_type = 'lnorm',
                              sad_coef = list('meanlog' = .x$meanlog,
                                              'sdlog'= .x$sdlog)))) %>% 
  # prepare stem map for visualisation 
  mutate(poisson_stem_map = map(poisson_comm, 
                                ~tibble(x = .x$census$x,
                                        y = .x$census$y,
                                        species = .x$census$species))) %>% 
  # convert sample to SAD for further calculations
  mutate(poisson_sad = map(poisson_comm, ~community_to_sad(.x)),
         # to 2d
         poisson_sad = map(poisson_sad, ~tibble(species = names(.x),
                                                N = as.numeric(.x)))) %>% 
  # calculate IBR
  mutate(poisson_ibr = map(poisson_sad, ~rarefaction(x = .x$N,
                                                     method = 'IBR')),
         # wrangle for plotting
         poisson_ibr = map(poisson_ibr, ~tibble(individuals = 1:length(.x),
                                                expected_richness = as.numeric(.x)))) 

# examine abundance, richness and evenness for the different communities
metrics <- communities %>% 
  select(sim, region, data, poisson_sad) %>% 
  unnest(c(poisson_sad)) %>% 
  group_by(sim, region) %>% 
  summarise(J = sum(N),
            S = calc_div(N, index = 'S'),
            PIE = calc_div(N, index = 'PIE'),
            S_PIE = calc_div(N, index = 'S_PIE')) %>% 
  ungroup() %>% 
  # put the known parameters back in for visualisation
  left_join(community_params)
# plot ibr
communities %>% 
  unnest(c(data, poisson_ibr)) %>% 
  select(sim, region, individuals, expected_richness) %>% 
  mutate(spatial_dist = 'random') %>% 
  left_join(community_params, by = 'region') %>% 
  mutate(region_label = paste0('S = ',Spool,
                               ', J = ', J,
                               ', sd = ', sdlog)) %>% 
  ggplot() +
  facet_wrap(~region_label, scales = 'free') + 
  geom_line(aes(x = individuals, y = expected_richness, 
                group = sim)) +
  labs(y = 'Expected number of species',
       x = 'Number of individuals') +
  theme(legend.position = 'none')

# plot metrics
ggplot() +
  geom_boxplot(data = metrics,
               aes(x = as.factor(J), y = S,
                   group = region, colour = as.factor(sdlog))) +
  scale_colour_manual(name = 'Evenness',
                      values = c('0.8' = '#ffa600',
                                  '2' = '#003f5c'),
                      labels = c('0.8' = 'more even',
                                 '2' = 'less even')) +
  labs(y = 'Species richness',
       x = 'Number of individuals')

But, scale.

Unfortunately, our simplification to three components is only useful for biodiversity accounting at one scale. The shape of the rarefaction curve is again a useful, albeit limited, abstraction of the difficulty ahead: biodiversity scales non-linearly [refs]. The individual-based rarefaction curve is often limited because with sufficient sampling effort on the x-axis, the curve cab bend upwards again (e.g., triphasic SAR, (Rosenzweig 1995). The nonlinear scaling of biodiversity is well known, but not necessarily well understood, particularly the potential for non-linear scaling to impact estimates of biodiversity change at different scales. One way to approach the non-linear scaling is via autocorrelation or within species aggregation (McGill 2011a).

Here we will approach the scale-dependence of biodiversity using an approach introduced by Whittaker (1960) that considers two discrete spatial scales and another to relate the two. Specifically, the larger (sometimes referred to as the regional) scale is often called the gamma (\(\gamma\)) scale, and the smaller (local) scale is called the alpha (\(\alpha\)) scale (and the mean \(\alpha\)-diversity is \(\bar{\alpha}\)). Whittaker’s diversity partition assumes a multiplicative relationship between the two scales:

\[\gamma = \bar{\alpha} * \beta,\] where \(\beta\)-diversity describes the variation among samples (or sites or time points) in species composition.

By making scale discrete, we can continue to use the individual-based rarefaction to visualise and describe biodiversity. We’ll consider each sample as coming from the \(\alpha\)-scale. And then combine multiple samples to create the \(\gamma\)-scale.

# set a RNG seed 
set.seed(55)

# simulate a community with 200 species and 5000 individuals, and 
# a lognormal SAD, and individuals randomly distributed in space
Spool <- 200
Jpool <- 5000
sim_poisson_comm <- sim_poisson_community(s_pool = Spool,
                                          n_sim = Jpool,
                                          sad_type = 'lnorm',
                                          sad_coef = list('meanlog' = log(Spool/Jpool),
                                                          'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm1 <- sim_poisson_comm$census %>% 
  as_tibble()

# simulate quadrat sampling from the community
quad_area <- 0.01

samps <- mobsim::sample_quadrats(comm = sim_poisson_comm,
                n_quadrats = 5, 
                quadrat_area = quad_area,
                method = 'random',
                avoid_overlap = FALSE,
                plot = FALSE)

quad_xy <- samps$xy_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site')

# plot samples
plot_samples <- ggplot() +
  geom_point(data = comm1,
             aes(x = x, y = y, colour = species)) +
  geom_rect(data = quad_xy,
            aes(xmin = x, ymin = y,
                xmax = x + sqrt(quad_area),
                ymax = y + sqrt(quad_area)),
            fill = alpha('grey',0),
                colour = '#ff0000',
            linewidth = 1.1) +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(legend.position = 'none',
        axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_rect(fill = NA, colour = 'black'))


# need counts of each species for Species Abundance Distribution (and plotting 
# rarefaction curve). Our samples are in a site x species matrix (sometimes
# called wide data [cf. long data]).
alpha_long <- samps$spec_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site') %>% 
  pivot_longer(cols = !site,
               names_to = 'species',
               values_to = 'N')

# calculate IBR, and put it in a tibble (dataframe) for plotting  
alpha_ibr <- alpha_long %>% 
  group_by(site) %>% 
  nest(data = c(species, N)) %>% 
  mutate(ibr = map(data, ~rarefaction(.x$N, method = 'IBR'))) %>% 
  unnest(ibr) %>% 
  mutate(individuals = 1:n()) %>% 
  ungroup()

# to get the gamma-scale curve, we need to aggregate the species counts
# across species
gamma_long <- alpha_long %>% 
  group_by(species) %>% 
  summarise(N = sum(N)) %>% 
  ungroup()

gamma_ibr <- tibble(gamma_richness = rarefaction(gamma_long$N, method = 'IBR')) %>% 
  mutate(individuals = 1:n())

# plot rarefaction curves
ibr_2scales <- ggplot() +
  geom_line(data = alpha_ibr,
            aes(x = individuals, y = ibr, group = site, colour = 'alpha')) +
  geom_line(data = gamma_ibr,
            aes(x = individuals, y = gamma_richness, colour = 'gamma')) +
  scale_colour_manual(name = 'Scale',
                      values = c('alpha' = '#025c00',
                                 'gamma' = '#ffc619'),
                      labels = c(expression(alpha),
                                 expression(gamma))) +
  labs(x = 'Number of individuals',
       y = 'Expected number of species') +
  theme_bw() +
  theme(legend.position = c(0.99,0.01),
        legend.justification = c(1,0))

plot_grid(plot_samples,
          ibr_2scales,
          nrow = 1)

We can use Whittaker’s diversity partition to calculate \(\beta\)-diversity for these data (i.e., \(\beta = \frac{\gamma}{\bar{\alpha}}\). Before doing so, what does it means when the \(\alpha\)-scale curves are so closely aligned with the \(\gamma\)-scale curve? The overlapping rarefaction curves means that the smaller (\(\alpha\)) scale samples are effectively a random subsample of the larger (\(\gamma\)) scale. This makes sense, because we know that the individuals of all species were distributed randomly in space. Before diving into some beta-diversity metrics, let’s look at how within species aggregation changes what we see.

# set a RNG seed 
set.seed(55)

# simulate a community with 200 species and 5000 individuals, and 
# a lognormal SAD, and individuals randomly distributed in space
Spool <- 200
Jpool <- 5000
sim_aggr_comm <- sim_thomas_community(s_pool = Spool,
                                      n_sim = Jpool,
                                      sad_type = 'lnorm',
                                      sad_coef = list('meanlog' = log(Spool/Jpool),
                                                      'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm2 <- sim_aggr_comm$census %>% 
  as_tibble()

# simulate quadrat sampling from the community
quad_area <- 0.01

samps2 <- mobsim::sample_quadrats(comm = sim_aggr_comm,
                n_quadrats = 5, 
                quadrat_area = quad_area,
                method = 'random',
                avoid_overlap = FALSE,
                plot = FALSE)

quad_xy <- samps2$xy_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site')

# plot samples
plot_samples <- ggplot() +
  geom_point(data = comm2,
             aes(x = x, y = y, colour = species)) +
  geom_rect(data = quad_xy,
            aes(xmin = x, ymin = y,
                xmax = x + sqrt(quad_area),
                ymax = y + sqrt(quad_area)),
            fill = alpha('grey',0),
                colour = '#ff0000',
            linewidth = 1.1) +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(legend.position = 'none',
        axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_rect(fill = NA, colour = 'black'))


# need counts of each species for Species Abundance Distribution (and plotting 
# rarefaction curve). Our samples are in a site x species matrix (sometimes
# called wide data [cf. long data]).
alpha_long2 <- samps2$spec_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site') %>% 
  pivot_longer(cols = !site,
               names_to = 'species',
               values_to = 'N')

# calculate IBR, and put it in a tibble (dataframe) for plotting  
alpha_ibr2 <- alpha_long2 %>% 
  group_by(site) %>% 
  nest(data = c(species, N)) %>% 
  mutate(ibr = map(data, ~rarefaction(.x$N, method = 'IBR'))) %>% 
  unnest(ibr) %>% 
  mutate(individuals = 1:n()) %>% 
  ungroup()

# to get the gamma-scale curve, we need to aggregate the species counts
# across species
gamma_long2 <- alpha_long2 %>% 
  group_by(species) %>% 
  summarise(N = sum(N)) %>% 
  ungroup()

gamma_ibr2 <- tibble(gamma_richness = rarefaction(gamma_long2$N, method = 'IBR')) %>% 
  mutate(individuals = 1:n())

# plot rarefaction curves
ibr_2scales2 <- ggplot() +
  geom_line(data = alpha_ibr2,
            aes(x = individuals, y = ibr, group = site, colour = 'alpha')) +
  geom_line(data = gamma_ibr2,
            aes(x = individuals, y = gamma_richness, colour = 'gamma')) +
  scale_colour_manual(name = 'Scale',
                      values = c('alpha' = '#025c00',
                                 'gamma' = '#ffc619'),
                      labels = c(expression(alpha),
                                 expression(gamma))) +
  labs(x = 'Number of individuals',
       y = 'Expected number of species') +
  theme_bw() +
  theme(legend.position = c(0.99,0.01),
        legend.justification = c(1,0))

plot_grid(plot_samples,
          ibr_2scales2,
          nrow = 1)

References

Hill, M. O. 1973. “Diversity and Evenness: A Unifying Notation and Its Consequences.” Ecology 54 (2): 427–32.
Hurlbert, Stuart H. 1971. “The Nonconcept of Species Diversity: A Critique and Alternative Parameters.” Ecology 52 (4): 577–86.
Jost, L. 2006. “Entropy and Diversity.” Oikos 113 (2): 363–75.
Jost, Lou. 2007. “Partitioning Diversity into Independent Alpha and Beta Components.” Ecology 88 (10): 2427–39.
McGill, Brian J. 2011a. “Linking Biodiversity Patterns by Autocorrelated Random Sampling.” American Journal of Botany 98 (3): 481–502.
———. 2011b. “Species Abundance Distributions.” In Biological Diversity: Frontiers In Measurement & Assessment (Eds Magurran, A. & McGill, B.), 105–22. Oxford University Press.
Olszewski, Thomas D. 2004. “A Unified Mathematical Framework for the Measurement of Richness and Evenness Within and Among Multiple Communities.” Oikos 104 (2): 377–87.
Rosenzweig, Michael L. 1995. Species Diversity in Space and Time. Cambridge University Press.
Whittaker, Robert Harding. 1960. “Vegetation of the Siskiyou Mountains, Oregon and California.” Ecological Monographs 30 (3): 279–338.